[improve](hive) Refactor csv reader #50379
Commits compared: 6843dd1 to 34968cc; a2955ec to 1fe74eb
CI benchmark results (total hot run time), in order of appearance:
TPC-H: 34089 ms | TPC-DS: 184641 ms | ClickBench: 29.18 s
TPC-H: 34204 ms | TPC-DS: 185210 ms | ClickBench: 29.1 s
TPC-H: 33981 ms | TPC-DS: 192605 ms | ClickBench: 29.92 s
TPC-H: 34120 ms | TPC-DS: 193278 ms | ClickBench: 29.76 s
TPC-H: 33678 ms | TPC-DS: 188534 ms | ClickBench: 29.74 s
TPC-H: 33865 ms | TPC-DS: 186043 ms | ClickBench: 28.58 s
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
- [x] Implement TextReader for reading Hive text tables.
- [x] Clean up CsvReader.
- [x] Add cases to test the CSV and text formats.

Behavior changes:
1. Don't parse "\\N" or any other string as null for Hive OpenCsv tables.
   Hive's OpenCsv table format does not define a representation for null values. This differs from the behavior of Doris's own CSV import.
2. Fix the bug when reading complex types from Hive OpenCsv tables.
   The current code incorrectly uses the Hive text format to parse the complex types of Hive OpenCsv tables. This PR fixes that and uses the JSON format for parsing.
What problem does this PR solve?
Release note
Behavior changes:
Hive's OpenCsv table format does not define a representation for null values, so "\\N" and other strings are no longer parsed as null when reading Hive OpenCsv tables. This differs from the behavior of Doris's own CSV import.
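The difference in null handling can be illustrated with a minimal Python sketch. It is not the Doris implementation: the function names are hypothetical, and it assumes the text format uses `\N` as its null marker (the Hive default).

```python
from typing import List, Optional

# Hypothetical helpers for illustration only; these are not Doris APIs.

def parse_text_field(raw: str, null_format: str = "\\N") -> Optional[str]:
    """Text-format style: a field equal to the null marker becomes None.

    Assumption: the null marker is the two-character string \\N.
    """
    return None if raw == null_format else raw


def parse_opencsv_field(raw: str) -> Optional[str]:
    """OpenCsv style as described in this PR: there is no null marker,
    so every field, including \\N, is kept as a plain string."""
    return raw


row: List[str] = ["1", "\\N", ""]
print([parse_text_field(f) for f in row])     # ['1', None, '']
print([parse_opencsv_field(f) for f in row])  # ['1', '\\N', '']
```

With the same input row, only the text-format path produces a null; the OpenCsv path returns the literal string unchanged.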
Before this PR, the code incorrectly used the Hive text format to parse complex types in Hive OpenCsv tables. This PR fixes that by parsing them in JSON format instead.
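To show why the choice of parser matters for complex types, here is a rough Python sketch that parses an array field both ways. The function names are hypothetical, the `\x02` collection delimiter is an assumed Hive text default, and the real reader lives in the Doris backend, so treat this only as an illustration of the idea.

```python
import json
from typing import Any, List

# Hypothetical helpers for illustration only; these are not Doris APIs.

def parse_array_hive_text(raw: str, collection_delim: str = "\x02") -> List[str]:
    """Hive text style: split the field on the collection delimiter.

    Assumption: the delimiter is \\x02 (Ctrl-B).
    """
    return raw.split(collection_delim) if raw else []


def parse_array_json(raw: str) -> List[Any]:
    """JSON style: the field holds a JSON array literal."""
    value = json.loads(raw)
    if not isinstance(value, list):
        raise ValueError("expected a JSON array")
    return value


field = "[1,2,3]"
print(parse_array_hive_text(field))  # ['[1,2,3]']  (one unsplit element)
print(parse_array_json(field))       # [1, 2, 3]
```

Applying the text-format splitter to a JSON-encoded field yields a single unparsed element, which is the kind of mismatch the fix addresses.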
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)